Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches
Abstract
We consider the relative performance of two common approaches to multiple imputation (MI): joint MI, in which the data are modeled as a sample from a joint distribution, and conditional MI, in which each variable is modeled conditionally on all the others. Implementations of joint MI are typically restricted in two ways. First, the joint distribution of the data is assumed to be multivariate normal. Second, to use the multivariate normal distribution, the categories of discrete variables are assumed to be probabilistically constructed from continuous values. We use simulations to examine the implications of these assumptions. For each approach, we assess (1) the accuracy of the imputed values and (2) the accuracy of coefficients and fitted values from a model fit to completed datasets. These simulations consider continuous, binary, ordinal, and unordered-categorical variables. One set of simulations ...

∗Corresponding author. We thank Yu-sung Su, Yajuan Si, Sonia Torodova, Jingchen Liu, Michael Malecki, and two anonymous reviewers for their comments. An earlier version of this study was presented at the Annual Meeting of the Society for Political Methodology, Chapel Hill, NC, July 20, 2012.
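The conditional approach described in the abstract can be illustrated with a minimal sketch of chained-equations imputation for two continuous variables. This is an illustration under stated assumptions, not the procedure used in the study. Each conditional model is assumed to be a simple linear regression. Parameter uncertainty is approximated by refitting each regression on a bootstrap resample of the observed rows, which is similar in spirit to bootstrap regression imputation. The function names `fit_line`, `impute_column`, and `chained_imputation` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_column(target, predictor, missing, rng):
    """Overwrite missing entries of `target` with draws from target | predictor.

    The regression is refit on a bootstrap resample of the rows where `target`
    is observed, so each pass reflects some uncertainty about a, b, and sd.
    """
    observed = [i for i, m in enumerate(missing) if not m]
    while True:  # resample until the predictor varies, so that sxx > 0
        boot = [rng.choice(observed) for _ in observed]
        if len({predictor[i] for i in boot}) > 1:
            break
    a, b, sd = fit_line([predictor[i] for i in boot], [target[i] for i in boot])
    for i, m in enumerate(missing):
        if m:
            target[i] = a + b * predictor[i] + rng.gauss(0.0, sd)


def chained_imputation(x, y, n_iter=10, m=5, seed=0):
    """Return m completed copies of (x, y); None marks a missing value.

    Assumes each column has at least three observed values.
    """
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    mean_x = statistics.fmean([v for v in x if v is not None])
    mean_y = statistics.fmean([v for v in y if v is not None])
    completed = []
    for _ in range(m):
        # Start each chain from mean imputation, then cycle the conditional models.
        cx = [mean_x if v is None else v for v in x]
        cy = [mean_y if v is None else v for v in y]
        for _ in range(n_iter):
            impute_column(cx, cy, miss_x, rng)  # draw x | current y
            impute_column(cy, cx, miss_y, rng)  # draw y | current x
        completed.append((cx, cy))
    return completed
```

Calling `chained_imputation(x, y, m=5)` with `None` marking missing entries returns five completed copies of the data, which would then each be analyzed and the results pooled. A joint approach would instead draw the missing values of both columns from a single fitted multivariate normal distribution. In a conditional approach, a binary or categorical column would get its own conditional model, such as a logistic regression, rather than the linear model used here.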
Similar resources
Several approaches to handling missing values of quantitative variables and an examination of their effect on the results of a clinical trial
Background and Objectives: A major challenge in longitudinal studies is missing data. Missing data can cause the loss of part of the information, which reduces the precision of estimators and can make the results biased and inaccurate. Therefore, it is necessary to evaluate the missing-data mechanism in longitudinal research and to consider thi...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most geochemical datasets contain missing data in varying proportions, which can cause significant problems in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in geochemical studies. In this study, three approaches, half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence
We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial distributions for categorical variables with Dirichlet process mixtures of multivariate normal distributions for continuous variables. We inco...
Multiple imputation of discrete and continuous data by fully conditional specification.
The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and ful...
Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.
Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional sp...